Tikrinama hipotezė: Ar pilnatis turi įtakos eismo įvykiams Jungtinėse valstijose 2021 metais
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
pd.set_option('max_rows', 100)
pd.set_option('max_columns', 100)
us_original = pd.read_csv(r"C:\Users\asein\OneDrive\Documents\US_Accidents_Dec21_updated.csv")
us_original.head(5)
ID | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | End_Lat | End_Lng | Distance(mi) | Description | Number | Street | Side | City | County | State | Zipcode | Country | Timezone | Airport_Code | Weather_Timestamp | Temperature(F) | Wind_Chill(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | Amenity | Bump | Crossing | Give_Way | Junction | No_Exit | Railway | Roundabout | Station | Stop | Traffic_Calming | Traffic_Signal | Turning_Loop | Sunrise_Sunset | Civil_Twilight | Nautical_Twilight | Astronomical_Twilight | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A-1 | 3 | 2016-02-08 00:37:08 | 2016-02-08 06:37:08 | 40.108910 | -83.092860 | 40.112060 | -83.031870 | 3.230 | Between Sawmill Rd/Exit 20 and OH-315/Olentang... | NaN | Outerbelt E | R | Dublin | Franklin | OH | 43017 | US | US/Eastern | KOSU | 2016-02-08 00:53:00 | 42.1 | 36.1 | 58.0 | 29.76 | 10.0 | SW | 10.4 | 0.00 | Light Rain | False | False | False | False | False | False | False | False | False | False | False | False | False | Night | Night | Night | Night |
1 | A-2 | 2 | 2016-02-08 05:56:20 | 2016-02-08 11:56:20 | 39.865420 | -84.062800 | 39.865010 | -84.048730 | 0.747 | At OH-4/OH-235/Exit 41 - Accident. | NaN | I-70 E | R | Dayton | Montgomery | OH | 45424 | US | US/Eastern | KFFO | 2016-02-08 05:58:00 | 36.9 | NaN | 91.0 | 29.68 | 10.0 | Calm | NaN | 0.02 | Light Rain | False | False | False | False | False | False | False | False | False | False | False | False | False | Night | Night | Night | Night |
2 | A-3 | 2 | 2016-02-08 06:15:39 | 2016-02-08 12:15:39 | 39.102660 | -84.524680 | 39.102090 | -84.523960 | 0.055 | At I-71/US-50/Exit 1 - Accident. | NaN | I-75 S | R | Cincinnati | Hamilton | OH | 45203 | US | US/Eastern | KLUK | 2016-02-08 05:53:00 | 36.0 | NaN | 97.0 | 29.70 | 10.0 | Calm | NaN | 0.02 | Overcast | False | False | False | False | True | False | False | False | False | False | False | False | False | Night | Night | Night | Day |
3 | A-4 | 2 | 2016-02-08 06:51:45 | 2016-02-08 12:51:45 | 41.062130 | -81.537840 | 41.062170 | -81.535470 | 0.123 | At Dart Ave/Exit 21 - Accident. | NaN | I-77 N | R | Akron | Summit | OH | 44311 | US | US/Eastern | KAKR | 2016-02-08 06:54:00 | 39.0 | NaN | 55.0 | 29.65 | 10.0 | Calm | NaN | NaN | Overcast | False | False | False | False | False | False | False | False | False | False | False | False | False | Night | Night | Day | Day |
4 | A-5 | 3 | 2016-02-08 07:53:43 | 2016-02-08 13:53:43 | 39.172393 | -84.492792 | 39.170476 | -84.501798 | 0.500 | At Mitchell Ave/Exit 6 - Accident. | NaN | I-75 S | R | Cincinnati | Hamilton | OH | 45217 | US | US/Eastern | KLUK | 2016-02-08 07:53:00 | 37.0 | 29.8 | 93.0 | 29.69 | 10.0 | WSW | 10.4 | 0.01 | Light Rain | False | False | False | False | False | False | False | False | False | False | False | False | False | Day | Day | Day | Day |
us_original.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2845342 entries, 0 to 2845341 Data columns (total 47 columns): # Column Dtype --- ------ ----- 0 ID object 1 Severity int64 2 Start_Time object 3 End_Time object 4 Start_Lat float64 5 Start_Lng float64 6 End_Lat float64 7 End_Lng float64 8 Distance(mi) float64 9 Description object 10 Number float64 11 Street object 12 Side object 13 City object 14 County object 15 State object 16 Zipcode object 17 Country object 18 Timezone object 19 Airport_Code object 20 Weather_Timestamp object 21 Temperature(F) float64 22 Wind_Chill(F) float64 23 Humidity(%) float64 24 Pressure(in) float64 25 Visibility(mi) float64 26 Wind_Direction object 27 Wind_Speed(mph) float64 28 Precipitation(in) float64 29 Weather_Condition object 30 Amenity bool 31 Bump bool 32 Crossing bool 33 Give_Way bool 34 Junction bool 35 No_Exit bool 36 Railway bool 37 Roundabout bool 38 Station bool 39 Stop bool 40 Traffic_Calming bool 41 Traffic_Signal bool 42 Turning_Loop bool 43 Sunrise_Sunset object 44 Civil_Twilight object 45 Nautical_Twilight object 46 Astronomical_Twilight object dtypes: bool(13), float64(13), int64(1), object(20) memory usage: 773.4+ MB
us_original.columns
Index(['ID', 'Severity', 'Start_Time', 'End_Time', 'Start_Lat', 'Start_Lng', 'End_Lat', 'End_Lng', 'Distance(mi)', 'Description', 'Number', 'Street', 'Side', 'City', 'County', 'State', 'Zipcode', 'Country', 'Timezone', 'Airport_Code', 'Weather_Timestamp', 'Temperature(F)', 'Wind_Chill(F)', 'Humidity(%)', 'Pressure(in)', 'Visibility(mi)', 'Wind_Direction', 'Wind_Speed(mph)', 'Precipitation(in)', 'Weather_Condition', 'Amenity', 'Bump', 'Crossing', 'Give_Way', 'Junction', 'No_Exit', 'Railway', 'Roundabout', 'Station', 'Stop', 'Traffic_Calming', 'Traffic_Signal', 'Turning_Loop', 'Sunrise_Sunset', 'Civil_Twilight', 'Nautical_Twilight', 'Astronomical_Twilight'], dtype='object')
us_original.isna().sum()
ID 0 Severity 0 Start_Time 0 End_Time 0 Start_Lat 0 Start_Lng 0 End_Lat 0 End_Lng 0 Distance(mi) 0 Description 0 Number 1743911 Street 2 Side 0 City 137 County 0 State 0 Zipcode 1319 Country 0 Timezone 3659 Airport_Code 9549 Weather_Timestamp 50736 Temperature(F) 69274 Wind_Chill(F) 469643 Humidity(%) 73092 Pressure(in) 59200 Visibility(mi) 70546 Wind_Direction 73775 Wind_Speed(mph) 157944 Precipitation(in) 549458 Weather_Condition 70636 Amenity 0 Bump 0 Crossing 0 Give_Way 0 Junction 0 No_Exit 0 Railway 0 Roundabout 0 Station 0 Stop 0 Traffic_Calming 0 Traffic_Signal 0 Turning_Loop 0 Sunrise_Sunset 2867 Civil_Twilight 2867 Nautical_Twilight 2867 Astronomical_Twilight 2867 dtype: int64
us_new = us_original[['ID', 'Severity', 'Start_Time', 'Start_Lat', 'Start_Lng', 'Side', 'City', 'County', 'State', 'Timezone']]
us_new = us_new.rename(columns={"Start_Lat":"Latitude","Start_Lng":"Longitude","Start_Time":"Event_Time"})
us_new.head()
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | |
---|---|---|---|---|---|---|---|---|---|---|
0 | A-1 | 3 | 2016-02-08 00:37:08 | 40.108910 | -83.092860 | R | Dublin | Franklin | OH | US/Eastern |
1 | A-2 | 2 | 2016-02-08 05:56:20 | 39.865420 | -84.062800 | R | Dayton | Montgomery | OH | US/Eastern |
2 | A-3 | 2 | 2016-02-08 06:15:39 | 39.102660 | -84.524680 | R | Cincinnati | Hamilton | OH | US/Eastern |
3 | A-4 | 2 | 2016-02-08 06:51:45 | 41.062130 | -81.537840 | R | Akron | Summit | OH | US/Eastern |
4 | A-5 | 3 | 2016-02-08 07:53:43 | 39.172393 | -84.492792 | R | Cincinnati | Hamilton | OH | US/Eastern |
us_new["Event_Time"] = pd.to_datetime(us_new["Event_Time"])
us_new.shape
(2845342, 10)
# Parodo, kad daugiau nei 50% duomenų sudaro iš 2021 metų laikotarpio.
us_new["Event_Time"].dt.year.value_counts(normalize=True)
2021 0.531305 2020 0.219961 2019 0.090891 2017 0.057609 2018 0.057348 2016 0.042886 Name: Event_Time, dtype: float64
us2021 = us_new[(us_new["Event_Time"] >= "2021-01-01") & (us_new["Event_Time"] < "2022-01-01")]
us2021.isna().sum()
ID 0 Severity 0 Event_Time 0 Latitude 0 Longitude 0 Side 0 City 61 County 0 State 0 Timezone 1588 dtype: int64
us2021.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1511745 entries, 224945 to 2069159 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 1511745 non-null object 1 Severity 1511745 non-null int64 2 Event_Time 1511745 non-null datetime64[ns] 3 Latitude 1511745 non-null float64 4 Longitude 1511745 non-null float64 5 Side 1511745 non-null object 6 City 1511684 non-null object 7 County 1511745 non-null object 8 State 1511745 non-null object 9 Timezone 1510157 non-null object dtypes: datetime64[ns](1), float64(2), int64(1), object(6) memory usage: 126.9+ MB
#Koordinates pakeičiama į str tipą, kad vėliau būtų galima naudoti "location" funkcijoje.
us2021["Latitude"] = us2021["Latitude"].astype(str)
us2021["Longitude"] = us2021["Longitude"].astype(str)
us2021.info()
C:\Users\asein\AppData\Local\Temp/ipykernel_3516/1686928660.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy us2021["Latitude"] = us2021["Latitude"].astype(str) C:\Users\asein\AppData\Local\Temp/ipykernel_3516/1686928660.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy us2021["Longitude"] = us2021["Longitude"].astype(str)
<class 'pandas.core.frame.DataFrame'> Int64Index: 1511745 entries, 224945 to 2069159 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 1511745 non-null object 1 Severity 1511745 non-null int64 2 Event_Time 1511745 non-null datetime64[ns] 3 Latitude 1511745 non-null object 4 Longitude 1511745 non-null object 5 Side 1511745 non-null object 6 City 1511684 non-null object 7 County 1511745 non-null object 8 State 1511745 non-null object 9 Timezone 1510157 non-null object dtypes: datetime64[ns](1), int64(1), object(8) memory usage: 126.9+ MB
pip install geopy
Requirement already satisfied: geopy in c:\users\asein\anaconda3\lib\site-packages (2.2.0) Requirement already satisfied: geographiclib<2,>=1.49 in c:\users\asein\anaconda3\lib\site-packages (from geopy) (1.52) Note: you may need to restart the kernel to use updated packages.
#Pagal koordinates ištraukiama mietos pavadinimas, jei neranda gražina Nan.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="geoapiExercises")
def location(Latitude, Longitude):
location = geolocator.reverse(f'{Latitude},{Longitude}')
address = location.raw['address']
town = address.get('town', '')
city = address.get('city', '')
if city != '':
result = city
elif town != '':
result = town
else:
result = np.nan
return result
#Susikuriamas naujas dataframe kur nėra miestų
us2021_city_na = us2021[us2021["City"].isna()]
us2021.dropna(inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\util\_decorators.py:311: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return func(*args, **kwargs)
us2021_city_na["New_City"] = us2021_city_na.apply(lambda x:location(x["Latitude"],x['Longitude']), axis=1)
C:\Users\asein\AppData\Local\Temp/ipykernel_3516/2543745419.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy us2021_city_na["New_City"] = us2021_city_na.apply(lambda x:location(x["Latitude"],x['Longitude']), axis=1)
us2021_city_na["City"].fillna(us2021_city_na["New_City"], inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\core\generic.py:6392: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return self._update_inplace(result)
us2021_city_na.drop('New_City', axis=1, inplace=True)
C:\Users\asein\anaconda3\lib\site-packages\pandas\core\frame.py:4906: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return super().drop(
us2021_city_na.dropna(inplace=True)
us2021 = us2021.append(us2021_city_na)
us2021
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | |
---|---|---|---|---|---|---|---|---|---|---|
224945 | A-224946 | 2 | 2021-03-10 19:57:00 | 42.382359 | -71.022318 | R | East Boston | Suffolk | MA | US/Eastern |
224946 | A-224947 | 2 | 2021-07-30 23:37:00 | 40.908676 | -123.70711599999998 | L | Salyer | Humboldt | CA | US/Pacific |
224947 | A-224948 | 2 | 2021-10-15 16:42:36 | 38.97062800000001 | -77.116333 | R | Bethesda | Montgomery | MD | US/Eastern |
224948 | A-224949 | 2 | 2021-12-21 11:42:00 | 47.642651 | -122.31859 | R | Seattle | King | WA | US/Pacific |
224949 | A-224950 | 2 | 2021-12-09 08:51:00 | 37.419536 | -122.09064 | R | Mountain View | Santa Clara | CA | US/Pacific |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1501812 | A-1501813 | 2 | 2021-05-07 05:33:30 | 38.912285 | -76.934168 | R | Washington | District of Columbia | DC | US/Eastern |
1521239 | A-1521240 | 2 | 2021-02-10 13:30:00 | 38.912285 | -76.934168 | R | Washington | District of Columbia | DC | US/Eastern |
1588891 | A-1588892 | 2 | 2021-01-11 16:09:00 | 39.72126 | -77.735338 | R | Antrim Township | Washington | MD | US/Eastern |
1598427 | A-1598428 | 2 | 2021-02-18 00:31:52 | 38.912285 | -76.934168 | R | Washington | District of Columbia | DC | US/Eastern |
1617075 | A-1617076 | 2 | 2021-02-04 13:34:00 | 38.912285 | -76.934168 | R | Washington | District of Columbia | DC | US/Eastern |
1510143 rows × 10 columns
#Konvertuoja US laika į CET
def Timezone_to_CET(row):
if row["Timezone"] == "US/Eastern":
return row["Event_Time"] - pd.DateOffset(hours=6)
elif row["Timezone"] == "US/Pacific":
return row["Event_Time"] - pd.DateOffset(hours=9)
elif row["Timezone"] == "US/Central":
return row["Event_Time"] - pd.DateOffset(hours=7)
elif row["Timezone"] == "US/Mountain":
return row["Event_Time"] - pd.DateOffset(hours=9)
else:
return np.nan
us2021 = us2021.assign(CET_Time=us2021.apply(Timezone_to_CET, axis=1))
us2021.head(5)
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | CET_Time | |
---|---|---|---|---|---|---|---|---|---|---|---|
224945 | A-224946 | 2 | 2021-03-10 19:57:00 | 42.382359 | -71.022318 | R | East Boston | Suffolk | MA | US/Eastern | 2021-03-10 13:57:00 |
224946 | A-224947 | 2 | 2021-07-30 23:37:00 | 40.908676 | -123.70711599999998 | L | Salyer | Humboldt | CA | US/Pacific | 2021-07-30 14:37:00 |
224947 | A-224948 | 2 | 2021-10-15 16:42:36 | 38.97062800000001 | -77.116333 | R | Bethesda | Montgomery | MD | US/Eastern | 2021-10-15 10:42:36 |
224948 | A-224949 | 2 | 2021-12-21 11:42:00 | 47.642651 | -122.31859 | R | Seattle | King | WA | US/Pacific | 2021-12-21 02:42:00 |
224949 | A-224950 | 2 | 2021-12-09 08:51:00 | 37.419536 | -122.09064 | R | Mountain View | Santa Clara | CA | US/Pacific | 2021-12-08 23:51:00 |
import mysql.connector
mydb = mysql.connector.connect(
host="127.0.0.1",
port="3306",
user="root",
password="root"
)
cursor = mydb.cursor()
cursor.execute("USE my_db")
cursor.execute("SHOW TABLES")
for i in cursor:
print(i)
('full_moon',) ('moon_data',) ('us_accidents',)
moon = pd.read_sql("SELECT * FROM my_db.moon_data", con=mydb)
moon
Date | Time | |
---|---|---|
0 | 15 January 1900 | 08:07:30 pm |
1 | 14 February 1900 | 02:50:12 pm |
2 | 16 March 1900 | 09:11:48 am |
3 | 15 April 1900 | 02:02:06 am |
4 | 14 May 1900 | 04:36:36 pm |
... | ... | ... |
1863 | 1 September 2050 | 10:30:54 am |
1864 | 30 September 2050 | 06:31:48 pm |
1865 | 30 October 2050 | 04:16:00 am |
1866 | 28 November 2050 | 04:09:48 pm |
1867 | 28 December 2050 | 06:15:36 am |
1868 rows × 2 columns
moon.columns = moon.columns.str.strip()
moon["Date_Time"] = moon["Date"] + moon["Time"]
moon.head(5)
Date | Time | Date_Time | |
---|---|---|---|
0 | 15 January 1900 | 08:07:30 pm | 15 January 1900 08:07:30 pm |
1 | 14 February 1900 | 02:50:12 pm | 14 February 1900 02:50:12 pm |
2 | 16 March 1900 | 09:11:48 am | 16 March 1900 09:11:48 am |
3 | 15 April 1900 | 02:02:06 am | 15 April 1900 02:02:06 am |
4 | 14 May 1900 | 04:36:36 pm | 14 May 1900 04:36:36 pm |
moon[moon['Date_Time'].str.contains("]")].count()
Date 23 Time 23 Date_Time 23 dtype: int64
moon["Date_Time"] = moon["Date_Time"].str.split("[").str[0].str.replace("m ", "m")
moon["Date_Time"] = pd.to_datetime(moon['Date_Time'])
moon.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1868 entries, 0 to 1867 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 1868 non-null object 1 Time 1868 non-null object 2 Date_Time 1868 non-null datetime64[ns] dtypes: datetime64[ns](1), object(2) memory usage: 43.9+ KB
us2021["Full_moon_CET"] = us2021["CET_Time"].dt.date.isin(moon["Date_Time"].dt.date)
# Duomenys PowerBI vizualizacijai
us2021.to_csv("PowerBI_us_2021_data.csv")
us2021.head(5)
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | CET_Time | Full_moon_CET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
224945 | A-224946 | 2 | 2021-03-10 19:57:00 | 42.382359 | -71.022318 | R | East Boston | Suffolk | MA | US/Eastern | 2021-03-10 13:57:00 | False |
224946 | A-224947 | 2 | 2021-07-30 23:37:00 | 40.908676 | -123.70711599999998 | L | Salyer | Humboldt | CA | US/Pacific | 2021-07-30 14:37:00 | False |
224947 | A-224948 | 2 | 2021-10-15 16:42:36 | 38.97062800000001 | -77.116333 | R | Bethesda | Montgomery | MD | US/Eastern | 2021-10-15 10:42:36 | False |
224948 | A-224949 | 2 | 2021-12-21 11:42:00 | 47.642651 | -122.31859 | R | Seattle | King | WA | US/Pacific | 2021-12-21 02:42:00 | False |
224949 | A-224950 | 2 | 2021-12-09 08:51:00 | 37.419536 | -122.09064 | R | Mountain View | Santa Clara | CA | US/Pacific | 2021-12-08 23:51:00 | False |
us2021['Full_moon_CET'].value_counts(normalize=True)
False 0.966549 True 0.033451 Name: Full_moon_CET, dtype: float64
# Išskiriami įvykiai pilanties metu
us_2021_FullMoon = us2021[us2021["Full_moon_CET"]]
us_2021_FullMoon.head(5)
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | CET_Time | Full_moon_CET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
225093 | A-225094 | 2 | 2021-12-19 15:15:00 | 38.555563 | -121.441143 | R | Sacramento | Sacramento | CA | US/Pacific | 2021-12-19 06:15:00 | True |
225145 | A-225146 | 2 | 2021-10-21 06:06:00 | 34.076742 | -118.230768 | R | Los Angeles | Los Angeles | CA | US/Pacific | 2021-10-20 21:06:00 | True |
225159 | A-225160 | 2 | 2021-12-19 17:22:00 | 39.275191 | -121.015845 | R | Nevada City | Nevada | CA | US/Pacific | 2021-12-19 08:22:00 | True |
225179 | A-225180 | 2 | 2021-07-24 21:47:00 | 28.449998 | -81.469499 | R | Orlando | Orange | FL | US/Eastern | 2021-07-24 15:47:00 | True |
225230 | A-225231 | 2 | 2021-11-19 19:23:07 | 44.495381 | -93.870447 | R | Le Sueur | Le Sueur | MN | US/Central | 2021-11-19 12:23:07 | True |
# Išskiriami įvykiai ne pilnaties metu
us_2021_NonFullMoon = us2021[us2021["Full_moon_CET"] == False]
us_2021_NonFullMoon.head(5)
ID | Severity | Event_Time | Latitude | Longitude | Side | City | County | State | Timezone | CET_Time | Full_moon_CET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
224945 | A-224946 | 2 | 2021-03-10 19:57:00 | 42.382359 | -71.022318 | R | East Boston | Suffolk | MA | US/Eastern | 2021-03-10 13:57:00 | False |
224946 | A-224947 | 2 | 2021-07-30 23:37:00 | 40.908676 | -123.70711599999998 | L | Salyer | Humboldt | CA | US/Pacific | 2021-07-30 14:37:00 | False |
224947 | A-224948 | 2 | 2021-10-15 16:42:36 | 38.97062800000001 | -77.116333 | R | Bethesda | Montgomery | MD | US/Eastern | 2021-10-15 10:42:36 | False |
224948 | A-224949 | 2 | 2021-12-21 11:42:00 | 47.642651 | -122.31859 | R | Seattle | King | WA | US/Pacific | 2021-12-21 02:42:00 | False |
224949 | A-224950 | 2 | 2021-12-09 08:51:00 | 37.419536 | -122.09064 | R | Mountain View | Santa Clara | CA | US/Pacific | 2021-12-08 23:51:00 | False |
import seaborn as sns
# Bus naudojami grafikų X ašiai
Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
Days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))
sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.day_name().str[:3], data=us_2021_NonFullMoon, order=Days, ax=ax1, color = '#5FD4EF')
ax1.set(ylabel=None, xlabel=None)
ax1.set_title("Accidents by Weekday during Non Full Moon")
sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.day_name().str[:3], data=us_2021_FullMoon, order=Days, ax=ax2, color= '#0A536A')
ax2.set(ylabel=None, xlabel=None)
ax2.set_title("Accidents by Weekday during Full Moon")
Text(0.5, 1.0, 'Accidents by Weekday during Full Moon')
Tiek pilnaties, tiek nepilnaties metu daugiausiai įvykių penktadieniais. Galima teigti, kad penktadienis pavojingiausia diena vairuotojams.
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))
sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.month_name().str[:3], data=us_2021_NonFullMoon, order=Months, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel=None)
ax1.set_title("Accidents by Month during Non Full Moon")
sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.month_name().str[:3], data=us_2021_FullMoon, order=Months, ax=ax2,color= '#0A536A')
ax2.set(ylabel=None, xlabel=None)
ax2.set_title("Accidents by Month during Full Moon")
Text(0.5, 1.0, 'Accidents by Month during Full Moon')
Bendrai įvykių kiekis išauga žiemos sezonu. Pilnaties metu ypač išsiskiria lapkritis.
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))
sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.hour, data=us_2021_NonFullMoon, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel="Hours")
ax1.set_title("Accidents by Hour during Non Full Moon")
sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.hour, data=us_2021_FullMoon, ax=ax2,color= '#0A536A')
ax2.set(ylabel=None, xlabel="Hours")
ax2.set_title("Accidents by Hour during Full Moon")
Text(0.5, 1.0, 'Accidents by Hour during Full Moon')
Žiūrint į valandinį pasiskirystymą matosi beveik įdentiškas išsidėstymas, dominuoja darbo pradžios ir pabaigos valandos, tai įrodo eismo piką, nepaisant ar pilnatis ar ne.
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(15,4))
sns.countplot(x=us_2021_NonFullMoon['Event_Time'].dt.day, data=us_2021_NonFullMoon, ax=ax1,color = '#5FD4EF')
ax1.set(ylabel=None, xlabel="Hours")
ax1.set_title("Accidents by day during Non Full Moon")
sns.countplot(x=us_2021_FullMoon['Event_Time'].dt.day, data=us_2021_FullMoon, ax=ax2,color= '#0A536A')
ax2.set(ylabel=None, xlabel="Hours")
ax2.set_title("Accidents by day during Full Moon")
Text(0.5, 1.0, 'Accidents by day during Full Moon')
Dienų paskirstymas per daug nieko nesakantis, nėra kažkokių akivaizdžių išsišokimų. Pilnaties dienos parodo, 2021 metais pilnas būdavo mėnesio gale.
Accicent_per_day_nonFullMoon =us_2021_NonFullMoon.groupby([us_2021_NonFullMoon["Event_Time"].dt.month,us_2021_NonFullMoon["Event_Time"].dt.day])["ID"].count()
Accicent_per_day_nonFullMoon = Accicent_per_day_nonFullMoon.groupby("Event_Time")
Accicent_per_day_FullMoon = us_2021_FullMoon.groupby([us_2021_FullMoon["Event_Time"].dt.month,us_2021_FullMoon["Event_Time"].dt.day])["ID"].count()
Accicent_per_day_FullMoon = Accicent_per_day_FullMoon.groupby("Event_Time")
plt.figure(figsize=(10, 5))
plt.style.use('default')
X_axis = np.arange(len(Months))
plt.bar(X_axis - 0.2, Accicent_per_day_FullMoon.sum(), width = 0.4, label = "Full moon",color = '#0A536A')
plt.bar(X_axis + 0.2, Accicent_per_day_nonFullMoon.mean(), width = 0.4, label = "Non full moon",color= '#5FD4EF')
plt.xticks(X_axis, Months)
plt.legend()
plt.title("Average number of accidents per day")
Text(0.5, 1.0, 'Average number of accidents per day')
Net 8 mėnesiai, kai pilnaties dieną įvykusių įvykių skaičius lenkia likusių mėnesio dienų vidurkį. Ypač išsiskiria lapkritis.
plt.figure(figsize=(10, 5))
#plt.style.use("fivethirtyeight")
X_axis = np.arange(len(Months))
plt.bar(X_axis - 0.2, Accicent_per_day_FullMoon.sum(), width = 0.4, label = "Full moon",color = '#0A536A')
plt.bar(X_axis + 0.2, Accicent_per_day_nonFullMoon.max(), width = 0.4, label = "Non full moon",color= '#5FD4EF')
plt.xticks(X_axis, Months)
plt.legend()
plt.title("Max number of accidents per day in month")
Text(0.5, 1.0, 'Max number of accidents per day in month')
Žiurint į daugiausiai dienos įvykių per mėnesį, vaizdas visai kitas, nei su mėnesių vidurkiais. Lapkritį per pilnatį buvo daugiau įvykių.
# Metinis dienos įvykių vidurkis ne per pilnatį
Yearly_nonFullMoonAVG = us_2021_NonFullMoon.groupby([us_2021_NonFullMoon["Event_Time"].dt.month,us_2021_NonFullMoon["Event_Time"].dt.day])["ID"].count().mean()
Yearly_nonFullMoonAVG
4021.0110192837465
# Metinis dienos įvykių vidurkis per pilnatį
Yearly_FullMoonAVG = Accicent_per_day_FullMoon.sum().mean()
Yearly_FullMoonAVG
4209.666666666667
# 4,7% daugiau įvykių įvyksta pilnaties metu.
(Yearly_FullMoonAVG/Yearly_nonFullMoonAVG-1)*100
4.691746590053492
us_2021_FullMoon["Severity"].value_counts(normalize=True)
2 0.983965 4 0.016035 Name: Severity, dtype: float64
us_2021_NonFullMoon["Severity"].value_counts(normalize=True)
2 0.98411 4 0.01589 Name: Severity, dtype: float64
Buvo tikrinama ar pilnatis turi įtakos eismo įvykiams Jungitnėse valstijose. Nors pilnaties metu 2021 metais įvyko 4,7% daugiau eismo įvykių lyginant su kitomis dienomis, sunku įrodyti ar tai tikrai pilnaties efektas, reiktų daugiau faktorių tikrinti ar lokacijų. Kadangi mėnulio fazė buvo europos laiku, konvertuojant dalis įvykių per pilnatį pasiskirtė per kelias dienas, tai galimai galėjo paveiktį galutinį rezultatą.
Visgi hipotezė nei paneigta, nei įrodyta.